Language Identification in Degraded and Distorted Document Images

Internal identifier: 001052 (Main/Exploration); previous: 001051; next: 001053

Authors: Shijian Lu [Singapore]; Lim Tan [Singapore]; Weihua Huang [Singapore]

Source:

Lecture Notes in Computer Science [0302-9743]; 2006.

RBID : ISTEX:56A645066EDD2873E6CCAF30E577B5F784F8EDA1

Abstract

Abstract: This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Unlike previously reported methods, which transform word images through a character shape coding process, our method captures word shapes directly from the local extremum points and the horizontal intersection numbers, both of which tolerate noise, character segmentation errors, and slight skew distortions. For each language studied, a word shape template and a word frequency template are first constructed using the proposed word shape coding scheme. Identification is then based on the Bray-Curtis or Hamming distance between the word shape codes of the query images and the constructed word shape and frequency templates. Experiments show that the average identification rate across eight Latin-based languages exceeds 99%. ...
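The abstract names two distance measures but does not give the word shape coding scheme or the template format. The following is only a minimal sketch of template matching with those two distances, assuming that a template and a query can each be reduced to a fixed-length vector of code frequencies (for Bray-Curtis) or a fixed-length code sequence (for Hamming). The function names, the language labels `lang_a` and `lang_b`, and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis distance: sum(|u_i - v_i|) / sum(|u_i + v_i|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    # Two all-zero vectors are treated as identical.
    return numerator / denominator if denominator else 0.0


def hamming(a: Sequence, b: Sequence) -> int:
    """Hamming distance: number of positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def identify(query: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the template closest to the query (Bray-Curtis)."""
    return min(templates, key=lambda lang: bray_curtis(query, templates[lang]))


# Hypothetical toy data: relative frequencies of three word shape codes.
TEMPLATES = {
    "lang_a": [0.5, 0.3, 0.2],
    "lang_b": [0.2, 0.3, 0.5],
}
QUERY = [0.45, 0.35, 0.2]

if __name__ == "__main__":
    print(identify(QUERY, TEMPLATES))
```

In this sketch the query is assigned to whichever template yields the smallest distance; how the paper combines the word shape and word frequency templates is not stated in the abstract and is not modelled here.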

URL:

https://api.istex.fr/document/56A645066EDD2873E6CCAF30E577B5F784F8EDA1/fulltext/pdf

DOI: 10.1007/11669487_21

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 001345
to stream Istex, to step Curation: 001266
to stream Istex, to step Checkpoint: 000A25
to stream Main, to step Merge: 001069
to stream Main, to step Curation: 001052

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Language Identification in Degraded and Distorted Document Images</title>
<author><name sortKey="Lu, Shijian" sort="Lu, Shijian" uniqKey="Lu S" first="Shijian" last="Lu">Shijian Lu</name>
</author>
<author><name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
</author>
<author><name sortKey="Huang, Weihua" sort="Huang, Weihua" uniqKey="Huang W" first="Weihua" last="Huang">Weihua Huang</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:56A645066EDD2873E6CCAF30E577B5F784F8EDA1</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11669487_21</idno>
<idno type="url">https://api.istex.fr/document/56A645066EDD2873E6CCAF30E577B5F784F8EDA1/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001345</idno>
<idno type="wicri:Area/Istex/Curation">001266</idno>
<idno type="wicri:Area/Istex/Checkpoint">000A25</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Lu S:language:identification:in</idno>
<idno type="wicri:Area/Main/Merge">001069</idno>
<idno type="wicri:Area/Main/Curation">001052</idno>
<idno type="wicri:Area/Main/Exploration">001052</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Language Identification in Degraded and Distorted Document Images</title>
<author><name sortKey="Lu, Shijian" sort="Lu, Shijian" uniqKey="Lu S" first="Shijian" last="Lu">Shijian Lu</name>
<affiliation wicri:level="4"><country xml:lang="fr">Singapour</country>
<wicri:regionArea>School of Computing, National University of Singapore, 117543</wicri:regionArea>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Singapour</country>
</affiliation>
</author>
<author><name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
<affiliation wicri:level="4"><country xml:lang="fr">Singapour</country>
<wicri:regionArea>School of Computing, National University of Singapore, 117543</wicri:regionArea>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Singapour</country>
</affiliation>
</author>
<author><name sortKey="Huang, Weihua" sort="Huang, Weihua" uniqKey="Huang W" first="Weihua" last="Huang">Weihua Huang</name>
<affiliation wicri:level="4"><country xml:lang="fr">Singapour</country>
<wicri:regionArea>School of Computing, National University of Singapore, 117543</wicri:regionArea>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Singapour</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">56A645066EDD2873E6CCAF30E577B5F784F8EDA1</idno>
<idno type="DOI">10.1007/11669487_21</idno>
<idno type="ChapterID">21</idno>
<idno type="ChapterID">Chap21</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Different from the reported methods that transform word images through a character shape coding process, our method directly captures word shapes with the local extremum points and the horizontal intersection numbers, which are both tolerant of noise, character segmentation errors, and slight skew distortions. For each language studied, a word shape template and a word frequency template are firstly constructed based on the proposed word shape coding scheme. Identification is then accomplished based on Bray Curtis or Hamming distance between the word shape code of query images and the constructed word shape and frequency templates. Experiments show the average identification rate upon eight Latin-based languages reaches over 99%. ...</div>
</front>
</TEI>
<affiliations><list><country><li>Singapour</li>
</country>
<orgName><li>Université nationale de Singapour</li>
</orgName>
</list>
<tree><country name="Singapour"><noRegion><name sortKey="Lu, Shijian" sort="Lu, Shijian" uniqKey="Lu S" first="Shijian" last="Lu">Shijian Lu</name>
</noRegion>
<name sortKey="Huang, Weihua" sort="Huang, Weihua" uniqKey="Huang W" first="Weihua" last="Huang">Weihua Huang</name>
<name sortKey="Huang, Weihua" sort="Huang, Weihua" uniqKey="Huang W" first="Weihua" last="Huang">Weihua Huang</name>
<name sortKey="Lu, Shijian" sort="Lu, Shijian" uniqKey="Lu S" first="Shijian" last="Lu">Shijian Lu</name>
<name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
<name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
</country>
</tree>
</affiliations>
</record>
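As a hedged illustration of how the record above might be read programmatically, the sketch below extracts the title, authors, and DOI with Python's standard `xml.etree.ElementTree`. The record as shown uses a `wicri:` attribute prefix without declaring it, which a strict XML parser rejects as an unbound prefix, so the sketch wraps the text in an element that binds the prefix to a placeholder URI. The wrapper element, the URI `urn:example:wicri`, the function name, and the abridged excerpt are assumptions made for this example and are not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt of the record above (only the elements used below).
RECORD_EXCERPT = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Language Identification in Degraded and Distorted Document Images</title>
<author><name sortKey="Lu, Shijian" sort="Lu, Shijian" uniqKey="Lu S" first="Shijian" last="Lu">Shijian Lu</name>
</author>
<author><name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
</author>
<author><name sortKey="Huang, Weihua" sort="Huang, Weihua" uniqKey="Huang W" first="Weihua" last="Huang">Weihua Huang</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11669487_21</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def summarize_record(xml_text: str) -> dict:
    """Extract title, authors, and DOI from a record shaped like the one above."""
    # Bind the undeclared 'wicri' prefix to a placeholder URI (an assumption)
    # so that the parser accepts the wicri:* attributes.
    wrapped = '<wrapper xmlns:wicri="urn:example:wicri">' + xml_text + "</wrapper>"
    root = ET.fromstring(wrapped)
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [name.text for name in root.findall(".//titleStmt/author/name")],
        "doi": root.findtext(".//publicationStmt/idno[@type='doi']"),
    }


if __name__ == "__main__":
    for key, value in summarize_record(RECORD_EXCERPT).items():
        print(key, value, sep=": ")
```

Reading from `titleStmt` rather than from the `tree` element avoids the repeated author names that appear in the affiliations tree.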

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001052 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001052 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:56A645066EDD2873E6CCAF30E577B5F784F8EDA1
   |texte=   Language Identification in Degraded and Distorted Document Images
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
